The Price of Explainability for Clustering
October 25, 2023 (NSH 3305)

Abstract: Given a set of points in $d$-dimensional space, an explainable clustering is one where the clusters are specified by a tree of axis-aligned threshold cuts. Dasgupta et al.\ (ICML 2020) posed the question of the \emph{price of explainability}: the worst-case ratio of the cost of the best explainable clustering to that of the best unrestricted clustering.
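To make the model concrete, here is a minimal Python sketch (not from the paper; all names such as Node, assign, and kmedians_cost are illustrative) of such a threshold tree: each internal node compares one coordinate against a threshold, each leaf carries a cluster center, and the $k$-medians cost is the total $\ell_1$ distance from each point to the center of the leaf it reaches.

    # Illustrative sketch: an explainable clustering as a binary tree of
    # axis-aligned threshold cuts whose leaves are the clusters.
    from dataclasses import dataclass
    from typing import List, Optional, Sequence

    @dataclass
    class Node:
        dim: Optional[int] = None      # internal node: coordinate to test
        theta: Optional[float] = None  # internal node: threshold value
        left: Optional["Node"] = None  # points with x[dim] <= theta
        right: Optional["Node"] = None # points with x[dim] >  theta
        center: Optional[Sequence[float]] = None  # set at leaves only

    def assign(root: Node, x: Sequence[float]) -> Sequence[float]:
        """Route a point down the threshold tree to its cluster's center."""
        node = root
        while node.center is None:
            node = node.left if x[node.dim] <= node.theta else node.right
        return node.center

    def kmedians_cost(root: Node, points: List[Sequence[float]]) -> float:
        """Sum of L1 distances from each point to its assigned center."""
        total = 0.0
        for x in points:
            c = assign(root, x)
            total += sum(abs(xi - ci) for xi, ci in zip(x, c))
        return total

    # Example: two clusters in 2D, separated by the single cut x[0] <= 0.5.
    tree = Node(dim=0, theta=0.5,
                left=Node(center=(0.0, 0.0)),
                right=Node(center=(1.0, 1.0)))
    pts = [(0.1, 0.2), (0.9, 1.1), (0.4, 0.0)]
    print(kmedians_cost(tree, pts))  # 0.3 + 0.2 + 0.4 = 0.9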

We show that the price of explainability for $k$-medians is at most $1+H_{k-1}$, where $H_{k-1} = \sum_{i=1}^{k-1} 1/i \approx \ln k$ is the $(k-1)$-st harmonic number; in fact, we show that the popular Random Thresholds algorithm achieves \emph{exactly} this price of explainability, matching the known lower-bound constructions. We complement this tight analysis of one particular algorithm by constructing instances on which the price of explainability (under \emph{any} algorithm) is at least $(1-o(1)) \ln k$, so our result is best possible up to lower-order terms. We also improve the price of explainability for the $k$-means problem to $O(k \ln \ln k)$ from the previous $O(k \ln k)$, considerably narrowing the gap to the known lower bound of $\Omega(k)$.
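For intuition, the following Python sketch gives one common formulation of the Random Thresholds idea (the exact sampling scheme analyzed in the paper may differ, and the function name random_thresholds is my own): given $k$ distinct reference centers, repeatedly sample a uniformly random coordinate and threshold, let each sampled cut split every group of centers that it separates, and stop once all centers are isolated; the sequence of effective cuts then defines the threshold tree.

    # Hedged sketch of the Random Thresholds sampling scheme (one common
    # formulation, not necessarily the exact variant analyzed in the paper).
    import random
    from typing import List, Tuple

    def random_thresholds(centers: List[Tuple[float, ...]],
                          rng: random.Random) -> List[Tuple[int, float]]:
        """Return (dim, theta) cuts that together isolate every center.

        Assumes the centers are pairwise distinct. Applied in order, the
        cuts define the threshold tree: each group of centers reaching a
        node is split by the next cut that separates it.
        """
        d = len(centers[0])
        lo = [min(c[i] for c in centers) for i in range(d)]
        hi = [max(c[i] for c in centers) for i in range(d)]
        groups = [list(centers)]       # groups of not-yet-separated centers
        cuts: List[Tuple[int, float]] = []
        while any(len(g) > 1 for g in groups):
            dim = rng.randrange(d)                 # random coordinate
            theta = rng.uniform(lo[dim], hi[dim])  # random threshold
            new_groups, used = [], False
            for g in groups:
                left = [c for c in g if c[dim] <= theta]
                right = [c for c in g if c[dim] > theta]
                if left and right:     # this cut separates the group
                    new_groups += [left, right]
                    used = True
                else:
                    new_groups.append(g)
            groups = new_groups
            if used:                   # keep only cuts that did something
                cuts.append((dim, theta))
        return cuts

    print(random_thresholds([(0, 0), (1, 1), (2, 0)], random.Random(0)))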

Finally, we study the algorithmic question of finding the best explainable clustering: we show that, under standard complexity-theoretic conjectures, explainable $k$-medians and $k$-means cannot be approximated to within a factor of $o(\ln k)$. This essentially settles the approximability of explainable $k$-medians, and it leaves open the intriguing possibility of approximation algorithms for explainable $k$-means that are significantly better than its price of explainability.

This talk is based on a paper of the same name (FOCS 2023). Joint work with Anupam Gupta, Ola Svensson, and Rachel Yuan.